Implement prism -> sorbet conversion for multi-statement programs #28

Merged

egiurleo

Motivation

Sorbet constructs slightly different ASTs depending on whether a program contains one statement or more than one. Correctly parsing programs with more than one statement will make it easier to benchmark this project.

Test plan

Added automated tests for parsing a multi-statement program.

egiurleo force-pushed the emily/parse-multi-statement branch 2 times, most recently from 826a67d to c0e0cb2 on June 14, 2024 20:51
egiurleo force-pushed the emily/parse-multi-statement branch from c0e0cb2 to 2e863c2 on June 14, 2024 20:51
Comment on lines +176 to +197
pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
pm_statements_node *stmts = programNode->statements;

auto size = stmts->body.size;

// For a single statement, do not create a Begin node and just return the statement
if (size == 1) {
    return convertPrismToSorbet((pm_node *)stmts->body.nodes[0], parser, gs);
}

// For multiple statements, convert each statement and add them to the body of a Begin node
parser::NodeVec sorbetStmts;

for (int i = 0; i < stmts->body.size; i++) {
    pm_node_t *node = stmts->body.nodes[i];
    unique_ptr<parser::Node> convertedStmt = convertPrismToSorbet(node, parser, gs);
    sorbetStmts.emplace_back(std::move(convertedStmt));
}

auto *loc = &programNode->base.location;

return make_unique<parser::Begin>(locOffset(loc, parser), std::move(sorbetStmts));
Author

I'd love some advice on the implementation here. I decided to handle all the statement logic in the program node case because Sorbet doesn't have a representation of statement nodes: it just stores them as a NodeVec (vector of nodes) in the body of a Begin node, which is the Sorbet equivalent of a program node.

Probably not a huge deal, since this is still a prototype, but I'm trying to learn how to do things in C++ 😅
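A minimal, self-contained sketch of the conversion rule described above, written against simplified stand-in types: Node, Leaf, Begin, NodeVec, PrismNode, NodeList, convertStatement, and convertProgram are all hypothetical mocks for illustration, not Sorbet's real parser::Node classes or Prism's real structs. NodeList only mimics the shape of a Prism node list (a size plus a nodes pointer). One statement is returned as-is; two or more are wrapped in a Begin whose body is a vector of converted nodes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Sorbet's parser::Node hierarchy.
struct Node {
    virtual ~Node() = default;
};

struct Leaf : Node {
    std::string name;
    explicit Leaf(std::string n) : name(std::move(n)) {}
};

using NodeVec = std::vector<std::unique_ptr<Node>>;

struct Begin : Node {
    NodeVec stmts;
    explicit Begin(NodeVec s) : stmts(std::move(s)) {}
};

// Hypothetical stand-ins for a Prism node and node list: a raw array plus its length.
struct PrismNode {
    std::string name;
};

struct NodeList {
    std::size_t size;
    PrismNode **nodes;
};

// Stand-in for converting a single statement.
std::unique_ptr<Node> convertStatement(const PrismNode *node) {
    return std::make_unique<Leaf>(node->name);
}

// One statement: return it directly. Otherwise wrap every converted statement
// in a Begin node. (Zero statements fall through to an empty Begin in this sketch.)
std::unique_ptr<Node> convertProgram(const NodeList &body) {
    if (body.size == 1) {
        return convertStatement(body.nodes[0]);
    }
    NodeVec converted;
    converted.reserve(body.size);
    for (std::size_t i = 0; i < body.size; i++) {
        converted.emplace_back(convertStatement(body.nodes[i]));
    }
    return std::make_unique<Begin>(std::move(converted));
}
```

Using std::size_t for the loop index matches the unsigned size field and avoids comparing a signed int against an unsigned value, which is what "int i < stmts->body.size" does.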

Reviewer

I think this is fine if you think adding a node will require too many changes.

My C++ comment would be to iterate using a range-based for loop instead:
for (auto node : stmts->body.nodes) {}. It's cleaner and avoids off-by-one and bounds bugs. Note that auto node copies each element (invoking the copy constructor), which may be inefficient depending on the element type; declare the loop variable as a reference (auto & or const auto &) if that's what you need.
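To make the copy point concrete: in a range-based for loop, auto x declares a fresh copy of each element, while const auto &x binds a reference to it. The sketch below uses a hypothetical Tracked type that counts its copy-constructor calls. For an element type that is just a pointer (such as pm_node_t *), the "copy" is a single pointer copy, so the distinction matters mainly for heavier element types.

```cpp
#include <vector>

// Hypothetical element type that counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked &) { ++copies; }
};
int Tracked::copies = 0;

// Iterates by value: each element is copy-constructed into t.
int countByValue(const std::vector<Tracked> &items) {
    Tracked::copies = 0;
    for (auto t : items) {
        (void)t;  // silence unused-variable warnings
    }
    return Tracked::copies;
}

// Iterates by const reference: no copies are made.
int countByReference(const std::vector<Tracked> &items) {
    Tracked::copies = 0;
    for (const auto &t : items) {
        (void)t;
    }
    return Tracked::copies;
}
```

With three elements, the by-value loop performs three copies and the by-reference loop performs none.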

Author

It's really good to know you can do that in C++! Unfortunately, I can't iterate over a pm_node_list this way: stmts->body.nodes is a raw pointer with no begin function, so I get the error "Invalid range expression of type 'struct pm_node **'; no viable 'begin' function available". I can look into adding that to the Prism API, but for now I think an index loop is the only way to iterate.
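One C++17-compatible workaround, sketched under the assumption that the list exposes a size and a nodes pointer as in the snippet above: wrap the pointer and length in a tiny non-owning view that provides begin() and end(), which is all a range-based for loop needs. PtrView, FakeNode, and sumIds are hypothetical names for illustration; std::span (C++20) and absl::span are ready-made versions of the same idea.

```cpp
#include <cstddef>

// Minimal non-owning view over a raw pointer plus a length (C++17).
// It never copies, owns, or frees the underlying buffer.
template <typename T>
class PtrView {
public:
    PtrView(T *data, std::size_t size) : data_(data), size_(size) {}

    // begin()/end() are what make range-based for loops work.
    T *begin() const { return data_; }
    T *end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    T &operator[](std::size_t i) const { return data_[i]; }

private:
    T *data_;
    std::size_t size_;
};

// Hypothetical node type standing in for pm_node_t.
struct FakeNode {
    int id;
};

// Sums node ids using a range-based for loop over the view.
int sumIds(FakeNode **nodes, std::size_t count) {
    PtrView<FakeNode *> view(nodes, count);
    int total = 0;
    for (FakeNode *node : view) {
        total += node->id;
    }
    return total;
}
```

Applied to the converter, the loop could then read for (pm_node_t *stmt : PtrView<pm_node_t *>(stmts->body.nodes, stmts->body.size)), assuming body.nodes has type pm_node_t ** as the compiler error suggests.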

egiurleo marked this pull request as ready for review June 14, 2024 20:53
egiurleo self-assigned this Jun 14, 2024

Comment on lines +176 to +179
pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
pm_statements_node *stmts = programNode->statements;

auto size = stmts->body.size;

Reviewer

You can simplify this code a bit by wrapping this raw C pointer and size in a C++20 std::span. It works like a vector in that it lets you use C++-style range-based for loops, but it doesn't copy, own, or free the buffer.

Suggested change
-pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
-pm_statements_node *stmts = programNode->statements;
-auto size = stmts->body.size;
+pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
+pm_statements_node *stmts = programNode->statements;
+std::span<pm_node_t *> nodes(stmts->body.nodes, stmts->body.size);

Then you can:

if (nodes.size() == 1) {
    return convertPrismToSorbet(nodes[0], parser, gs);
}
for (auto node : nodes) {
    unique_ptr<parser::Node> convertedStmt = convertPrismToSorbet(node, parser, gs);
    sorbetStmts.emplace_back(std::move(convertedStmt));
}

Author

Aw man, we're actually using C++17, and std::span was only added in C++20 😭

Reviewer

That would be a real bummer, but luckily, we have absl::span, which is already used a fair bit throughout the codebase! 🥳

#34

egiurleo merged commit 2b83f0e into proj-parsing-w-prism-in-sorbet Jul 8, 2024
1 check passed
egiurleo deleted the emily/parse-multi-statement branch July 8, 2024 19:01
4 participants